Efficient algorithms for conditional independence inference

Authors: Bouckaert Remco R.; Hemmecke Raymond; Lindner Silvia; Studený Milan
Publication date: 01/01/2010

The topic of the paper is computer testing of (probabilistic) conditional independence (CI) implications by the algebraic method of structural imsets. The basic idea is to transform (sets of) CI statements into certain integral vectors and to verify by computer the corresponding algebraic relation between the vectors, called the independence implication. We interpret the previous methods for computer testing of this implication from the point of view of polyhedral geometry. However, the main contribution of the paper is a new method based on linear programming (LP). The new method overcomes the limitation that former methods placed on the number of variables involved. We recall and describe the theoretical basis for all four methods involved in our computational experiments, whose aim was to compare the efficiency of the algorithms. The experiments show that the LP method is clearly the fastest. As an example of a possible application of such algorithms, we show that testing inclusion of Bayesian network structures, or testing whether a CI statement is encoded in an acyclic directed graph, can be done by the algebraic method.

CiteSeerX

Research Commons@Waikato

Accuracy bounds for ensembles under 0-1 loss

Author: Bouckaert Remco R.
Publication venue: Dept. of Computer Science
Publication date: 01/01/2002

This paper is an attempt to increase the understanding of the behavior of ensembles for discrete variables in a quantitative way. A set of tight upper and lower bounds for the accuracy of an ensemble is presented for wide classes of ensemble algorithms, including bagging and boosting. The ensemble accuracy is expressed in terms of the accuracies of the members of the ensemble. Since those bounds represent best- and worst-case behavior only, we study typical behavior as well, and discuss its properties. A parameterised bound is presented which describes ensemble behavior as a mixture of dependent base classifier and independent base classifier areas. Some empirical results are presented to support our conclusions.
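To make the idea of expressing ensemble accuracy in terms of member accuracies concrete, the following is a minimal sketch of the fully independent case only. It assumes a binary problem, an odd number of base classifiers that all share the same accuracy `p`, and simple majority voting; the function name `majority_vote_accuracy` is hypothetical, and this is the standard binomial calculation, not the bounds derived in the paper.

```python
from math import comb


def majority_vote_accuracy(n: int, p: float) -> float:
    """Accuracy of a majority vote over n independent binary classifiers.

    Assumptions (not taken from the paper): two classes, n odd so that
    ties cannot occur, every classifier is correct with probability p,
    and the classifiers err independently of one another.
    """
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    smallest_majority = n // 2 + 1
    # Sum the binomial probabilities of having at least a majority correct.
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(smallest_majority, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 11):
        print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under these assumptions the ensemble improves on its members whenever `p` exceeds 0.5; dependent base classifiers can behave better or worse than this, which is the gap the best- and worst-case bounds address.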

CiteSeerX

Research Commons@Waikato

Recherches biostratigraphiques dans quelques coupes du Famennien de l'Avesnois (Nord de la France)

Authors: Bouckaert J.; Dreesen R.; Drijkoningen P.
Publication date: 01/01/1978

Conodonts and goniatites from four "old" Famennian sections in the Avesnois (France) have been carefully studied. For the first time, the biostratigraphic position of these sections is determined.

Open Marine Archive

Stratigraphic interpretation of the Tohogne borehole (province de Luxembourg). Devonian - Carboniferous transition

Authors: Bouckaert J.; Conil R.; Dusar M.
Publication date: 01/01/1978

The Tohogne borehole section, from the Lower Tournaisian into the Upper Famennian, has a remarkable micropalaeontological content (conodonts, foraminifers, spores), which enabled a detailed subdivision of these strata. New data in biostratigraphy and systematic palaeontology, along with palaeogeographic implications, are presented, as well as correlations with reference sections.

Open Marine Archive

Phylogeographic analysis of the Bantu language expansion supports a rainforest route

Authors: Blasi D.; Bouckaert R.; Gray R.; Greenhill S.; Koile E.
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 01/08/2022

The Bantu expansion transformed the linguistic, economic, and cultural composition of sub-Saharan Africa. However, the exact dates and routes taken by the ancestors of the speakers of the more than 500 current Bantu languages remain uncertain. Here, we use the recently developed “break-away” geographical diffusion model, specially designed for modeling migrations, with “augmented” geographic information, to reconstruct the Bantu language family expansion. This Bayesian phylogeographic approach with augmented geographical data provides a powerful way of linking linguistic, archaeological, and genetic data to test hypotheses about large language family expansions. We compare four hypotheses: an early major split north of the rainforest; a migration through the Sangha River Interval corridor around 2,500 BP; a coastal migration around 4,000 BP; and a migration through the rainforest before the corridor opening, at 4,000 BP. Our results produce a topology and timeline for the Bantu language family, which supports the hypothesis of an expansion through Central African tropical forests at 4,420 BP (4,040 to 5,000 95% highest posterior density interval), well before the Sangha River Interval was open

MPG.PuRe

Efficient estimation of AUC in a sliding window

Authors: A Bifet; C Ferri; D Brzezinski; DJ Hand; I Žliobaitė; J Gama; Remco R. Bouckaert
Publication venue: Springer Science and Business Media LLC
Publication date: 01/02/2019

In many applications, monitoring area under the ROC curve (AUC) in a sliding window over a data stream is a natural way of detecting changes in the system. The drawback is that computing AUC in a sliding window is expensive, especially if the window size is large and the data flow is significant. In this paper we propose a scheme for maintaining an approximate AUC in a sliding window of length $k$. More specifically, we propose an algorithm that, given $\epsilon$, estimates AUC within $\epsilon / 2$, and can maintain this estimate in $O((\log k) / \epsilon)$ time, per update, as the window slides. This provides a speed-up over the exact computation of AUC, which requires $O(k)$ time, per update. The speed-up becomes more significant as the size of the window increases. Our estimate is based on grouping the data points together, and using these groups to calculate AUC. The grouping is designed carefully such that (i) the groups are small enough, so that the error stays small, (ii) the number of groups is small, so that enumerating them is not expensive, and (iii) the definition is flexible enough so that we can maintain the groups efficiently. Our experimental evaluation demonstrates that the average approximation error in practice is much smaller than the approximation guarantee $\epsilon / 2$, and that we can achieve significant speed-ups with only a modest sacrifice in accuracy.
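For orientation, the exact baseline that the abstract compares against can be sketched as follows. This is not the paper's approximation scheme: it is a plain exact computation that keeps the last `k` (score, label) pairs and recomputes AUC on demand by sorting, so each query costs O(k log k) rather than the O(k) per update quoted above. The class name `SlidingWindowAUC` and its methods are hypothetical, and labels are assumed to be 0 or 1, with higher scores indicating the positive class.

```python
from collections import deque
from itertools import groupby
from typing import Deque, Optional, Tuple


class SlidingWindowAUC:
    """Exact AUC over the most recent k (score, label) pairs (illustrative)."""

    def __init__(self, k: int) -> None:
        if k <= 0:
            raise ValueError("window length k must be positive")
        # A bounded deque drops the oldest pair automatically when full.
        self.window: Deque[Tuple[float, int]] = deque(maxlen=k)

    def update(self, score: float, label: int) -> None:
        if label not in (0, 1):
            raise ValueError("label must be 0 or 1")
        self.window.append((score, label))

    def auc(self) -> Optional[float]:
        """Return AUC for the current window, or None if a class is missing."""
        n_pos = sum(label for _, label in self.window)
        n_neg = len(self.window) - n_pos
        if n_pos == 0 or n_neg == 0:
            return None
        wins = 0.0      # positive/negative pairs ranked correctly (ties count 0.5)
        neg_seen = 0    # negatives with a strictly lower score so far
        ordered = sorted(self.window)
        for _, group in groupby(ordered, key=lambda item: item[0]):
            labels = [label for _, label in group]
            pos_here = sum(labels)
            neg_here = len(labels) - pos_here
            wins += pos_here * neg_seen + 0.5 * pos_here * neg_here
            neg_seen += neg_here
        return wins / (n_pos * n_neg)


if __name__ == "__main__":
    monitor = SlidingWindowAUC(k=4)
    for score, label in [(0.1, 0), (0.4, 0), (0.35, 1), (0.8, 1), (0.9, 0)]:
        monitor.update(score, label)
        print(monitor.auc())
```

The grouping idea in the abstract replaces the per-point pass over the sorted window with a pass over a small number of groups, which is where the speed-up over an exact computation like this one would come from.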

arXiv.org e-Print Archive

Crossref